Bilingual Dictionary Construction with Transliteration Filtering
نویسندگان
چکیده
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine translation approach. We demonstrate that the extracted dictionary is accurate and of high recall (F1-score 0.8). Our lexicon contains not only single words but also multi-word expressions, and is freely available. Our experiments focus on Katakana-English lexicon construction, however it would be possible to apply the proposed methods to transliteration extraction for a variety of language pairs.
منابع مشابه
English-Chinese Transliteration Word Pair Extraction from Parallel Corpora
Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...
متن کاملExtracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries
Automatic translation knowledge acquisition or automatic bilingual dictionary construction has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms especially from languages in Roman alphabets to languages in non-Roman alphabets such as from E...
متن کاملA General Method for Creating a Bilingual Transliteration Dictionary
Transliteration is the rendering in one language of terms from another language (and, possibly, another writing system), approximating spelling and/or phonetic equivalents between the two languages. A transliteration dictionary is a crucial resource for a variety of natural language applications, most notably machine translation. We describe a general method for creating bilingual transliterati...
متن کاملEffective Transliteration
The translation of texts written in different languages is required in many domains, such as machine translation and cross-lingual information retrieval. Translating words of a text from a source language into a different target language can be efficiently achieved using a bilingual vocabulary, where every source word has a counterpart in the target language. In practice, however, there are oft...
متن کاملRobust Transliteration Mining from Comparable Corpora with Bilingual Topic Models
We present a high-precision, languageindependent transliteration framework applicable to bilingual lexicon extraction. Our approach is to employ a bilingual topic model to enhance the output of a state-of-the-art graphemebased transliteration baseline. We demonstrate that this method is able to extract a high-quality bilingual lexicon from a comparable corpus, and we extend the topic model to p...
متن کامل